NADA: A Robust System for Non-referential Pronoun Detection
Authors
Abstract
We present Nada: the Non-Anaphoric Detection Algorithm. Nada is a novel, publicly available program that accurately distinguishes between referential and non-referential instances of the pronoun "it" in raw English text. Like recent state-of-the-art approaches, Nada uses very large-scale web N-gram features, but it makes these features practical by compressing the N-gram counts so that they fit in memory. Nada therefore operates as a fast, stand-alone system. Nada also improves on previous web-scale systems by considering the entire sentence, rather than narrow context windows, via long-distance lexical features. Nada substantially outperforms other state-of-the-art systems in non-referential detection accuracy.
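The abstract names two ingredients: compressed web N-gram counts and whole-sentence lexical features. The following is a minimal sketch of how such a feature extractor could look, not Nada's actual implementation. It assumes a generic compression scheme (hash each N-gram to a 64-bit key and store its count as a one-byte log2 bucket). For N-gram features, it substitutes a few filler words for "it" in every 3-word window that covers the pronoun, and it uses every other word in the sentence as a stand-in for the long-distance lexical features. The names `CompressedCounts`, `quantize`, `extract_features`, and `FILLERS`, the window size, and the filler list are all hypothetical.

```python
import hashlib
import math
from collections import Counter

# Hypothetical filler words substituted for "it" when querying N-gram counts.
FILLERS = ("it", "they", "he", "she")


def _key(ngram: tuple[str, ...]) -> int:
    """Map an N-gram to a 64-bit integer key; the strings themselves are not stored."""
    digest = hashlib.blake2b(" ".join(ngram).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def quantize(count: int) -> int:
    """Compress a raw count into a one-byte log2 bucket (0 means unseen)."""
    return 0 if count <= 0 else min(255, 1 + int(math.log2(count)))


class CompressedCounts:
    """Hash-keyed table of log-quantized N-gram counts (illustrative scheme only).

    Hash collisions are possible but rare with 64-bit keys.
    """

    def __init__(self, raw_counts: dict[tuple[str, ...], int]):
        self._table = {_key(ng): quantize(c) for ng, c in raw_counts.items()}

    def bucket(self, ngram: tuple[str, ...]) -> int:
        return self._table.get(_key(ngram), 0)


def extract_features(tokens: list[str], i: int, counts: CompressedCounts, n: int = 3) -> Counter:
    """Features for the pronoun at position i: filler-substituted N-gram buckets plus sentence words."""
    feats: Counter = Counter()
    # Every window of length n that lies inside the sentence and covers position i.
    for start in range(max(0, i - n + 1), min(i, len(tokens) - n) + 1):
        window = list(tokens[start:start + n])
        for filler in FILLERS:
            window[i - start] = filler
            # Feature name records the window offset relative to the pronoun.
            feats[f"ngram:{start - i}:{filler}"] = counts.bucket(tuple(window))
    # Long-distance lexical features: every other word in the sentence, not just a local window.
    for j, word in enumerate(tokens):
        if j != i:
            feats[f"word:{word.lower()}"] += 1
    return feats
```

The resulting feature dictionaries could be passed to any standard linear classifier. The choice of learner is left open here because the abstract does not name one.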
Similar resources
Distributional Identification of Non-Referential Pronouns
We present an automatic approach to determining whether a pronoun in text refers to a preceding noun phrase or is instead non-referential. We extract the surrounding textual context of the pronoun and gather, from a large corpus, the distribution of words that occur within that context. We learn to reliably classify these distributions as representing either referential or non-referential pronou...
The Referential Versus Non-referential Use of the Neuter Pronoun in Dutch and English
This paper discusses a corpus-based investigation of the distribution of the third-person neuter singular pronoun in Dutch (“het”). We labeled all pronominal occurrences of “het” in a large corpus of documents. On the basis of the annotated corpora, we developed an automatic classification system using machine learning techniques to distinguish between the different uses of the neuter pronoun. A...
Disambiguation of the Neuter Pronoun and Its Effect on Pronominal Coreference Resolution
Coreference resolution, determining the appropriate discourse referent for an anaphoric expression, is an essential but difficult task in natural language processing. It has been observed that an important source of errors in machine-learning-based approaches to this task is the wrong disambiguation of the third-person singular neuter pronoun as either referential or non-referential. In this p...
Supervised Ranking for Pronoun Resolution: Some Recent Improvements
A recently-proposed machine learning approach to reference resolution — the twin-candidate approach — has been shown to be more promising than the traditional single-candidate approach. This paper presents a pronoun interpretation system that extends the twin-candidate framework by (1) equipping it with the ability to identify non-referential pronouns, (2) training different models for handling...
How Far Are We From (Semi-)Automatic Annotation Of Anaphoric Links In Corpora?
The paper raises for discussion a proposal for the semi-automatic annotation of pronoun-antecedent pairs in corpora. The proposal is based on robust knowledge-poor pronoun resolution followed by post-editing. The paper is structured as follows. The introduction comments on the fact that automatic identification of referential links in corpora has lagged behind in comparison with similar lexical...